feat: add prompt caching support for Kimi K2 on Groq #7324
Ported from upstream Cline repository PR #5697
Original PR: cline/cline#5697
- Added GroqUsage interface to handle cached token fields
- Implemented proper cost calculation with cache read discounts
- Enabled prompt caching for Kimi K2 model with 50% discount on cached tokens
- Updated tests to verify caching functionality
Co-authored-by: Cline Contributors <[email protected]>
Thank you for your contribution! I've reviewed the changes and found some issues that need attention before merging.
// Calculate non-cached input tokens for proper reporting
const nonCachedInputTokens = Math.max(0, inputTokens - cacheReadTokens - cacheWriteTokens)

console.log("usage", {
Debug logging should be removed from production code. Could we remove this console.log statement?
import type { ApiHandlerOptions } from "../../shared/api"
import type { ApiHandlerCreateMessageMetadata } from "../index"
import { ApiStream } from "../transform/stream"
import { convertToOpenAiMessages } from "../transform/openai-format"
Is this import still needed? It appears to be unused since the createMessage method is overridden and doesn't call convertToOpenAiMessages.
}

if (chunk.usage) {
yield* this.yieldUsage(chunk.usage as GroqUsage)
Could we add type validation here to ensure chunk.usage conforms to GroqUsage structure? The type assertion without validation could potentially cause runtime errors if the API response structure changes.
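One way to act on this suggestion is a small runtime type guard. The following is only a sketch: the `GroqUsage` shape is an assumption (only `prompt_tokens_details.cached_tokens` appears in the diff; `prompt_tokens` and `completion_tokens` are assumed to follow the usual OpenAI-style usage object), and `isGroqUsage` and `isNonNegativeNumber` are hypothetical helper names, not part of the PR.

```typescript
// Assumed shape of the usage payload. Only prompt_tokens_details.cached_tokens
// is confirmed by the diff; the other fields are assumptions.
interface GroqUsage {
	prompt_tokens?: number
	completion_tokens?: number
	prompt_tokens_details?: { cached_tokens?: number } | null
}

// Hypothetical helper: accepts only finite, non-negative numbers.
function isNonNegativeNumber(value: unknown): value is number {
	return typeof value === "number" && Number.isFinite(value) && value >= 0
}

// Hypothetical type guard: every field is optional, but any field that is
// present must be a non-negative number (or an object, for the details field).
function isGroqUsage(value: unknown): value is GroqUsage {
	if (typeof value !== "object" || value === null) return false
	const usage = value as Record<string, unknown>

	for (const key of ["prompt_tokens", "completion_tokens"]) {
		if (usage[key] !== undefined && !isNonNegativeNumber(usage[key])) return false
	}

	const details = usage.prompt_tokens_details
	if (details === undefined || details === null) return true
	if (typeof details !== "object") return false

	const cached = (details as Record<string, unknown>).cached_tokens
	return cached === undefined || isNonNegativeNumber(cached)
}
```

With a guard like this, the handler could check `chunk.usage && isGroqUsage(chunk.usage)` before calling `yieldUsage`, so a changed response shape is skipped (or logged) instead of silently producing NaN costs.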
const cacheReadTokens = usage?.prompt_tokens_details?.cached_tokens || 0

// Groq does not track cache writes
Could we expand this comment to provide more context? For example: 'Groq does not track cache writes - only cache reads are reported in the API response. This is a limitation of the Groq API as of [date].'
cacheReadTokens: 30,
})
expect(typeof firstChunk.value.totalCost).toBe("number")
})
Consider adding edge case tests:
- When prompt_tokens_details is present but cached_tokens is undefined
- When cached tokens exceed total prompt tokens (error case)
- Verify actual cost calculation values instead of just checking the type
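To make these edge cases concrete, the cost logic can be pulled into a pure function and checked with exact values. This is a sketch under stated assumptions, not the PR's implementation: `computeUsageCost`, `ModelPricing`, and the prices used in the usage note are hypothetical, and clamping cached tokens to the prompt total is just one possible way to handle the error case.

```typescript
// Hypothetical pricing shape, in USD per million tokens.
// The real values belong to the model configuration, not to this sketch.
interface ModelPricing {
	inputPrice: number
	outputPrice: number
	cacheReadsPrice: number
}

// Assumed usage shape (OpenAI-style field names).
interface UsageLike {
	prompt_tokens?: number
	completion_tokens?: number
	prompt_tokens_details?: { cached_tokens?: number } | null
}

interface UsageCost {
	inputTokens: number
	outputTokens: number
	cacheReadTokens: number
	nonCachedInputTokens: number
	totalCost: number
}

// Hypothetical helper: derives token counts and total cost from a usage object.
function computeUsageCost(usage: UsageLike, pricing: ModelPricing): UsageCost {
	const inputTokens = usage.prompt_tokens ?? 0
	const outputTokens = usage.completion_tokens ?? 0

	// Design choice (assumption): clamp cached tokens to the prompt total so a
	// malformed response cannot report more cached tokens than prompt tokens.
	const cacheReadTokens = Math.min(usage.prompt_tokens_details?.cached_tokens ?? 0, inputTokens)

	// Cache writes are not reported, so only cache reads are subtracted.
	const nonCachedInputTokens = Math.max(0, inputTokens - cacheReadTokens)

	const totalCost =
		(nonCachedInputTokens * pricing.inputPrice +
			cacheReadTokens * pricing.cacheReadsPrice +
			outputTokens * pricing.outputPrice) /
		1e6

	return { inputTokens, outputTokens, cacheReadTokens, nonCachedInputTokens, totalCost }
}
```

With illustrative prices of 1.0 (input), 0.5 (cache reads, i.e. the 50% discount), and 3.0 (output), a usage of 100 prompt tokens, 30 of them cached, and 50 completion tokens gives (70 × 1.0 + 30 × 0.5 + 50 × 3.0) / 1e6. A spec could then assert on that value with a tolerance such as `toBeCloseTo` rather than only checking `typeof totalCost`.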
Description
This PR ports the prompt caching support for Kimi K2 on Groq from the upstream Cline repository.
Ported from: cline/cline#5697
Changes
Implementation Details
Groq Handler
Model Configuration
Tests
Testing
✅ All tests passing (12/12)
✅ TypeScript compilation successful
✅ ESLint checks pass
Credits
This implementation is based on the original work from the Cline repository PR #5697.
Important
Adds prompt caching support for Kimi K2 on Groq with cost calculation and test updates.
- Enables prompt caching for the moonshotai/kimi-k2-instruct model with a 50% discount on cached input tokens in groq.ts.
- Adds a GroqUsage interface in groq.ts to handle cached token fields.
- Overrides createMessage() in GroqHandler to yield usage data with cache details.
- Adds yieldUsage() in GroqHandler to calculate and yield usage costs.
- Updates groq.spec.ts to verify caching functionality and cost calculations.
This description was created automatically for 8fa6f00.